-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docsp-31169 - change stream schema inference #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few things
source/read-from-mongodb.txt
Outdated
@@ -42,6 +42,18 @@ Overview | |||
|
|||
.. include:: /scala/filters.txt | |||
|
|||
.. important:: Change Stream Schema Inference | |||
|
|||
When the {+driver-short+} infers the schema of a data frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: should we rename this source constant since the product is not a driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it
source/configuration/read.txt
Outdated
| When set to ``true``: | ||
|
||
- The connector filters out messages that | ||
omit the ``fullDocument`` field and only publishes the value of the | ||
field. | ||
- If you don't specify a schema, the connector infers the schema | ||
from the change stream document rather than from the underlying collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I: per the style guide re: lists, do not start a list with a fragment
| When set to ``true``: | |
- The connector filters out messages that | |
omit the ``fullDocument`` field and only publishes the value of the | |
field. | |
- If you don't specify a schema, the connector infers the schema | |
from the change stream document rather than from the underlying collection. | |
| The connector displays the following behavior when this property is set to ``true``: | |
- The connector filters out messages that | |
omit the ``fullDocument`` field and only publishes the value of the | |
field. | |
- If you don't specify a schema, the connector infers the schema | |
from the change stream document rather than from the underlying collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch
source/read-from-mongodb.txt
Outdated
read from a change stream, by default, | ||
it will use the schema of the underlying collection rather than the schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S: Remove the "by default" here. It is implicit as the next sentence explains how to change the behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but to me it benefits from some indication that this behavior can be changed (apart from the following sentence, I mean). Especially when this isn't the same page where this option is documented, so the reader might not be in the mindset of settings they can turn on or off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me!
source/read-from-mongodb.txt
Outdated
When the {+driver-short+} infers the schema of a data frame | ||
read from a change stream, by default, | ||
it will use the schema of the underlying collection rather than the schema | ||
of the change stream. To instruct the connector to use the schema of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the {+driver-short+} infers the schema of a data frame | |
read from a change stream, by default, | |
it will use the schema of the underlying collection rather than the schema | |
of the change stream. To instruct the connector to use the schema of the | |
When the {+driver-short+} infers the schema of a data frame | |
read from a change stream, by default, | |
it will use the schema of the underlying collection rather than of the change stream. | |
To instruct the connector to use the schema of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used your suggestion plus "that" to hopefully make it more clear. LMK
source/read-from-mongodb.txt
Outdated
of the change stream. To instruct the connector to use the schema of the | ||
change stream, set the ``change.stream.publish.full.document.only`` option | ||
to ``true``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of the change stream. To instruct the connector to use the schema of the | |
change stream, set the ``change.stream.publish.full.document.only`` option | |
to ``true``. | |
of the change stream. The connector infers the schema from the | |
change stream if you set the ``change.stream.publish.full.document.only`` option | |
to ``true``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworded. just for my info, what don't you like about the first version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrasing of "instruct the connector" seemed needlessly complicated. Better to just have it be a declarative sentence, IMO. but it just a suggestion :)
source/read-from-mongodb.txt
Outdated
For more information on configuring a read operation, see the | ||
:ref:`spark-read-conf` guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S: make this link more specific about the change read read options and link here https://docs-mongodbcom-staging.corp.mongodb.com/spark-connector/docsworker-xlarge/docsp-31169-infer-schema/configuration/read/#change-streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what i had initially. not sure why i changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
* Spark Connector Minor versions 10.2 * Delete settings.json Deleting VS code settings
Co-authored-by: Caitlin Davey <caitlin@caitlindavey.com> (cherry picked from commit 099521a)
Pull Request Info
PR Reviewing Guidelines
JIRA - https://jira.mongodb.org/browse/DOCSP-31169
Staging:
Self-Review Checklist